VARIANCE CONSTRAINED MARKOV DECISION PROCESS
Authors
Abstract
Similar resources
Constrained Markov Decision Process and Optimal Policies
In the course lectures, we have discussed a great deal regarding the unconstrained Markov Decision Process (MDP). The dynamic programming decomposition and optimal policies for MDPs were also given. In this report, however, we discuss a different MDP model, the constrained MDP. There are many realistic demands for studying constrained MDPs. For instance, in wireless sensor networks, each...
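As a rough sketch of the standard formulation (the notation below is introduced here, not quoted from the report above), a constrained MDP maximizes one expected cumulative reward while keeping other expected cumulative costs below prescribed bounds:

\max_{\pi} \; \mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{T} r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{T} c_k(s_t, a_t)\Big] \le C_k, \qquad k = 1, \dots, K,

where \pi ranges over (possibly randomized) policies, r is the reward function, and the c_k and C_k are the cost functions and constraint levels.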
Constrained Markov Decision Processes
Preface: In many situations in the optimization of dynamic systems, a single utility for the optimizer might not suffice to describe the real objectives involved in the sequential decision making. A natural approach for handling such cases is that of optimization of one objective with constraints on other ones. This allows in particular to understand the tradeoff between t...
Quantile Markov Decision Process
In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of Markov Decision Processes (MDPs), which we refer to as Quantile Markov Decision Processes (QMDPs). Traditionally, the goal of a Markov Decision Process (MDP) is to maximize the expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may ...
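As an illustrative formulation (the symbols here are assumed, not taken from the abstract), the quantile objective replaces the expectation with a \tau-quantile of the cumulative reward:

\max_{\pi} \; Q_{\tau}\!\Big(\sum_{t=0}^{T} r(s_t, a_t) \;\Big|\; \pi\Big),
\qquad
Q_{\tau}(X) = \inf\{x \in \mathbb{R} : \mathbb{P}(X \le x) \ge \tau\}.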
Denumerable Constrained Markov Decision Problems and Finite Approximations
The purpose of this paper is twofold. First, to establish the theory of discounted constrained Markov Decision Processes with countable state and action spaces and a general multi-chain structure. Second, to introduce finite approximation methods. We define the occupation measures and obtain properties of the set of all achievable occupation measures under the different admissible policies. We est...
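For orientation only, and in notation assumed here rather than quoted from the paper, the discounted occupation measure of a policy \pi with initial distribution \beta is

\rho_{\pi}(s, a) = \sum_{t=0}^{\infty} \gamma^{t} \, \mathbb{P}^{\pi}_{\beta}(s_t = s, a_t = a),

and the constrained problem can then be written as a linear program over achievable occupation measures:

\max_{\rho \ge 0} \; \sum_{s,a} \rho(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_{s,a} \rho(s,a)\, c_k(s,a) \le C_k,
\qquad
\sum_{a} \rho(s', a) = \beta(s') + \gamma \sum_{s,a} P(s' \mid s, a)\, \rho(s, a) \;\; \text{for all } s'.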
Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for oth...
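A minimal sketch of the variance-constrained problem that this page's title refers to, in notation assumed here: writing R = \sum_{t=0}^{T} r(s_t, a_t) for the cumulative reward, one seeks

\max_{\pi} \; \mathbb{E}^{\pi}[R]
\quad \text{subject to} \quad
\operatorname{Var}^{\pi}(R) = \mathbb{E}^{\pi}[R^2] - \big(\mathbb{E}^{\pi}[R]\big)^2 \le V,

a problem the abstract above reports to be NP-hard in some cases.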
Journal
Journal title: Journal of the Operations Research Society of Japan
Year: 1987
ISSN: 0453-4514, 2188-8299
DOI: 10.15807/jorsj.30.88